NSF PAR Search | NSF Public Access Repository

Dual-Path Minimum-Phase and All-Pass Decomposition Network for Single Channel Speech Dereverberation

https://doi.org/10.1109/ICASSP48485.2024.10446719

Liu, Xi; Chen, Szu-Jui; Hansen, John H.L. (April 2024, IEEE)

With the development of deep neural networks (DNN), many DNN-based speech dereverberation approaches have been proposed and have achieved significant improvements over traditional methods. However, most deep learning-based dereverberation methods focus solely on suppressing reverberation in the time-frequency domain, without utilizing cepstral domain features that are potentially useful for dereverberation. In this paper, we propose a dual-path neural network structure to separately process the minimum-phase and all-pass components of single channel speech. First, we decompose the speech signal into minimum-phase and all-pass components in the cepstral domain; then a Conformer-embedded U-Net is used to remove reverberation from both components. Finally, we combine the two processed components to synthesize the enhanced output. The performance of the proposed method is tested on the REVERB-Challenge evaluation dataset in terms of commonly used objective metrics. Experimental results demonstrate that our method outperforms the compared methods.
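The decomposition step named in the abstract can be illustrated with the textbook homomorphic (cepstral folding) method. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function name `minphase_allpass_decompose`, the FFT size, and the `eps` floor are all hypothetical choices, and the abstract does not specify how the paper performs the decomposition.

```python
import numpy as np


def minphase_allpass_decompose(frame, n_fft=512, eps=1e-10):
    """Split one frame's spectrum into minimum-phase and all-pass factors.

    Hypothetical helper for illustration only (textbook cepstral folding).
    Frames longer than n_fft are truncated by the FFT.
    """
    if n_fft % 2:
        raise ValueError("n_fft must be even")
    spectrum = np.fft.fft(np.asarray(frame, dtype=float), n_fft)

    # Real cepstrum: inverse FFT of the log-magnitude spectrum.
    cepstrum = np.fft.ifft(np.log(np.abs(spectrum) + eps)).real

    # Fold the cepstrum onto non-negative quefrencies: keep bin 0 and the
    # Nyquist bin, double the positive quefrencies, zero the negative ones.
    fold = np.zeros(n_fft)
    fold[0] = 1.0
    fold[1:n_fft // 2] = 2.0
    fold[n_fft // 2] = 1.0

    # The folded cepstrum maps back to a minimum-phase spectrum that has
    # (up to eps) the same magnitude as the input spectrum.
    min_phase = np.exp(np.fft.fft(cepstrum * fold))

    # The remainder has (approximately) unit magnitude: the all-pass part.
    all_pass = spectrum / min_phase
    return min_phase, all_pass


# Toy non-minimum-phase frame with zeros at z = 2 and z = 0.5.
frame = np.array([1.0, -2.5, 1.0])
min_phase, all_pass = minphase_allpass_decompose(frame)
```

By construction, the product `min_phase * all_pass` reproduces the original spectrum, so two separately processed components can be recombined by multiplication in the frequency domain. How the paper's network processes and recombines the components is not described in the abstract.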
Full Text Available
Fearless Steps Apollo: Team Communications Based Community Resource Development for Science, Technology, Education, and Historical Preservation

https://doi.org/10.1109/ICASSP48485.2024.10446811

Hansen, John H.L.; Joglekar, Aditya; Shekar, Meena M.C.; Chen, Szu-Jui; Liu, Xi (April 2024, IEEE)

The Fearless Steps Apollo (FS-APOLLO) resource is a collection of 150,000 hours of audio, associated meta-data, and supplemental speech technology infrastructure intended to benefit the (i) speech processing technology, (ii) communication science and team-based psychology, and (iii) education/STEM and history/preservation/archival communities. The FS-APOLLO initiative, which started in 2014, has since resulted in the preservation of over 75,000 hours of NASA Apollo Missions audio. Systems created for this audio collection have led to the emergence of several new Speech and Language Technologies (SLT). This paper provides an overview of the latest advancements in the FS-APOLLO effort and explores upcoming strategies in big-data deployment, outreach, and novel avenues of K-12 and STEM education facilitated through this resource.
Full Text Available
FeaRLESS: Feature Refinement Loss for Ensembling Self-Supervised Learning Features in Robust End-to-end Speech Recognition

https://doi.org/10.21437/Interspeech.2022-10917

Chen, Szu-Jui; Xie, Jiamin; Hansen, John H.L. (September 2022, ISCA INTERSPEECH-2022)

Self-supervised learning representations (SSLR) have resulted in robust features for downstream tasks in many fields. Recently, several SSLRs have shown promising results on automatic speech recognition (ASR) benchmark corpora. However, previous studies have only shown performance for a single SSLR as the input feature for ASR models. In this study, we investigate the effectiveness of diverse SSLR combinations using various fusion methods within end-to-end (E2E) ASR models. In addition, we show that there are correlations between these extracted SSLRs. As such, we further propose a feature refinement loss for decorrelation to efficiently combine the set of input features. For evaluation, we show that the proposed “FeaRLESS learning features” perform better than systems without the proposed feature refinement loss for both the WSJ and Fearless Steps Challenge (FSC) corpora.
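To make the idea of a decorrelation penalty concrete, the sketch below computes a generic cross-correlation penalty between two feature streams. It is an assumption-laden illustration: the abstract does not give the form of the feature refinement loss, so the function name `cross_correlation_penalty`, the per-dimension standardization, and the mean-squared aggregation are hypothetical choices and should not be read as the paper's definition.

```python
import numpy as np


def cross_correlation_penalty(feat_a, feat_b, eps=1e-8):
    """Mean squared cross-correlation between two (frames, dims) feature sets.

    Generic decorrelation penalty for illustration only; NOT the loss
    defined in the paper.
    """
    # Standardize each feature dimension over the time (frame) axis.
    a = (feat_a - feat_a.mean(axis=0)) / (feat_a.std(axis=0) + eps)
    b = (feat_b - feat_b.mean(axis=0)) / (feat_b.std(axis=0) + eps)

    # Cross-correlation matrix of shape (dims_a, dims_b).
    corr = a.T @ b / a.shape[0]

    # Penalize every correlated pair of dimensions equally.
    return float(np.mean(corr ** 2))


# Synthetic stand-ins for two SSLR feature streams (random data, not real features).
rng = np.random.default_rng(0)
stream_a = rng.standard_normal((1000, 4))
stream_b = rng.standard_normal((1000, 4))

redundant = cross_correlation_penalty(stream_a, stream_a)
complementary = cross_correlation_penalty(stream_a, stream_b)
```

With these synthetic inputs the penalty for a stream paired with itself is much larger than for two independent streams, which is the behavior a decorrelation term added to a training objective would rely on.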
Full Text Available
Scenario Aware Speech Recognition: Advancements for Apollo Fearless Steps & CHiME-4 Corpora

https://doi.org/10.1109/ASRU51503.2021.9688225

Chen, Szu-Jui; Xia, Wei; Hansen, John H.L. (December 2021, IEEE ASRU-2021: Automatic Speech Recognition & Understanding Workshop)

In this study, we investigate triplet loss as the basis of an alternative feature representation for ASR. We consider TRILL, a general non-semantic speech representation trained with a self-supervised criterion based on triplet loss, for acoustic modeling to represent the acoustic characteristics of each audio recording. This strategy is then applied to the CHiME-4 corpus and the CRSS-UTDallas Fearless Steps Corpus, with emphasis on the 100-hour challenge corpus, which consists of 5 selected NASA Apollo-11 channels. An analysis of the extracted embeddings provides the foundation needed to characterize training utterances into distinct groups based on distinguishing acoustic properties. Moreover, we also demonstrate that the triplet-loss-based embedding performs better than i-Vector in acoustic modeling, confirming that the triplet loss is more effective than a speaker feature. With additional techniques such as pronunciation and silence probability modeling, plus multi-style training, we achieve a +5.42% and +3.18% relative WER improvement for the development and evaluation sets of the Fearless Steps Corpus, respectively. To explore generalization, we further test the same technique on the 1-channel track of CHiME-4 and observe a +11.90% relative WER improvement for real test data.
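For readers unfamiliar with the training criterion mentioned above, the sketch below shows the standard hinge-style triplet loss on embedding vectors. It is a generic illustration: the function name `triplet_loss`, the squared Euclidean distance, and the margin value are assumptions, and the abstract does not state how TRILL selects its triplets or which distance it uses.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge triplet loss averaged over a batch of embeddings.

    Generic formulation for illustration; the distance and margin are
    assumed, not taken from the paper.
    """
    # Squared Euclidean distances from the anchor to each counterpart.
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)

    # The loss is zero once the negative is farther than the positive
    # by at least the margin.
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


# Hypothetical 2-D embeddings: the positive is close, the negative is far.
anchor = np.array([[0.0, 0.0]])
positive = np.array([[0.0, 1.0]])
negative = np.array([[3.0, 0.0]])

well_separated = triplet_loss(anchor, positive, negative)
violating = triplet_loss(anchor, negative, positive)
```

In the first call the negative is already far enough from the anchor, so the loss is zero; swapping the roles of the positive and negative produces a positive loss that training would push down.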
Full Text Available
